Follow the instructions on the Hades website on how to set up R and how to install HADES.
While this tutorial does not use many of the Hades packages, these are often quite useful to have.
This tutorial does require DatabaseConnector and SqlRender from Hades, along with keyring, dplyr, kableExtra, DT and their dependencies.
These can be installed directly from CRAN.
We will use keyring to hide database credentials.
The following codes only needs to be executed once per machine.
keyring::key_set_with_value("strategusTutorialUser", password = "<FILL-IN>")
keyring::key_set_with_value("strategusTutorialPassword", password = "<FILL-IN>")
keyring::key_set_with_value("strategusTutorialServer", password = "<FILL-IN>")Note that keyring requires libsodium that is sometimes not available.
Alternatively, special values can be held in as environmental variables (in eg .Renviron) and accessed via Sys.getenv("strategusTutorialUser").
We hold Strategus results on the OHDSI public database server shinydb.
The web-page at https://bit.ly/strategus_data_model describes the data models for Strategus modules and some instructions on how to access results in these models.
Our database server runs postgresql.
You can install the postgresql JDBC drivers; this only needs to be done once.
It is recommend to specify the environmental variable DATABASECONNECT_JAR_FOLDER=<folder-of-your-choice> in .Renviron located in the user’s home directory.
In the example below, however, we set the environmental variable manually in R to make this document more self-contained.
Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = getwd())
DatabaseConnector::downloadJdbcDrivers(dbms = "postgresql")and set-up connection details, in this case using keyring to hide usernames and passwords for security`
# OHDSI shinydb read-only credentials
connectionDetails <- DatabaseConnector::createConnectionDetails(
dbms = "postgresql",
server = keyring::key_get("strategusTutorialServer"),
user = keyring::key_get("strategusTutorialUser"),
password = keyring::key_get("strategusTutorialPassword"))Table 2.1 provides the schema names for some Strategus results.
| study | schema |
|---|---|
| GLP1RA vs DPP4I tutorial | strategus_tutorial |
| Fluoroquinolone and aortic aneurysms | quinoloneaa |
| Intravitreal anti-VEGF and kidney failure | antivegfkf |
We will use the Hades packages DatabaseConnector and SqlRender to query the results tables directly in R.
Many joins and column-select can be done either on the server-side (postgresql) or client-side (R).
We prefer the server-side when accessing large tables.
On the client-side, dplyr is effective.
Here is an example of a basic query to database_meta_data to return data-source names and abbreviations (with client-side select):
schema <- "strategus_tutorial"
connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)
sql <- SqlRender::render(sql = "
SELECT * FROM @schema.database_meta_data",
schema = schema) # parameterized
metaData <- DatabaseConnector::querySql(connection = connection,
sql = sql,
snakeCaseToCamelCase = TRUE)
DatabaseConnector::disconnect(connection)
names(metaData)## [1] "cdmSourceName" "cdmSourceAbbreviation"
## [3] "cdmHolder" "sourceDescription"
## [5] "sourceDocumentationReference" "cdmEtlReference"
## [7] "sourceReleaseDate" "cdmReleaseDate"
## [9] "cdmVersion" "cdmVersionConceptId"
## [11] "vocabularyVersion" "databaseId"
## [13] "maxObsPeriodEndDate"
Now is a good time to (manually) change the abbreviations in databaseName to consistent, similar-length, reader-friendly choices for our manuscript(s).
Likewise, we retrieve reader-friendly cohort names. Note that the following codes does not return negative control cohort names; these are not (?) currently stored in the results-set.
connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)
sql <- SqlRender::render(sql = "
SELECT DISTINCT cohort_id, cohort_name FROM @schema.cg_cohort_generation",
schema = schema)
cohortName <- DatabaseConnector::querySql(connection = connection,
sql = sql,
snakeCaseToCamelCase = TRUE)
DatabaseConnector::disconnect(connection)Often these names are unusable for reader-friendly documents.
See, e.g., cohortName[1, "cohortName"] which is [OHDSItutorial] GLP1RA exposures 60-day eras - in cohorts: (19022) starts within D: -99999 - D: 0 of cohort start and ends D: 0 - D: 99999 of cohort start, first ever occurence with at least 365 days prior observation and 1 days follow up observation, males, females, occurs after 2017-12-01 and before 2023-12-31.
We will re-code these names for prettier output.
Here we demonstrate DT::datatable that provides an interactive (HTML-only) table renderer useful for web-pages and presentations.
Here is an example of a basic query to a cohort_counts table to return cohort sizes.
We then output the counts using a well-formatted table
connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)
sql <- SqlRender::render(sql = "
SELECT database_id, cohort_id, cohort_subjects FROM @schema.cg_cohort_count",
schema = schema)
cohortCount <- DatabaseConnector::querySql(connection = connection,
sql = sql,
snakeCaseToCamelCase = TRUE) %>%
inner_join(databaseName, by = "databaseId") %>%
inner_join(cohortName, by = "cohortId") %>%
select(shortCohortName, shortDatabaseName, cohortSubjects) %>%
rename(cohortName = shortCohortName, databaseName = shortDatabaseName) %>%
arrange(cohortName, databaseName)
DatabaseConnector::disconnect(connection)The kableExtra package (especially when printed to PDF) makes beautiful tables.
table <- cohortCount %>% filter(cohortName %in% c("GLP1RA", "DPP4I")) %>%
rename(Cohort = cohortName, Database = databaseName, Subjects = cohortSubjects) %>%
mutate(Subjects = prettyNum(Subjects,
format = "d",
big.mark = ",")) %>% # Can easily change for _Lancet_
tidyr::spread(Cohort, Subjects)
tab <- table %>% kbl(align = "lrr") %>% kable_classic(full_width = F) %>%
add_header_above(c(" " = 1, "Exposure" = 2)) %>%
row_spec(0, align = "c")
if (knitr::is_latex_output()) {
tab %>% kable_styling(latex_options = "striped",
font_size = latex_table_font_size)
} else {
tab ## bootstrap-striped is not nicely compatible with `add_header_above`
}|
Exposure
|
||
|---|---|---|
| Database | DPP4I | GLP1RA |
| CCAE | 89,137 | 306,463 |
| JMDC | 82,051 | 15,381 |
| MDCR | 13,715 | 28,520 |
| OEHR | 279,475 | 459,266 |
Often it is most convenient to hand-craft latex table headers and footers and cat this material directly to the output.
Here is a silly example.
# Requires "results='asis'"
library(xtable)
cat(gsub("\\\\", "\\", r"{
\begin{table}[h]
\begin{center}
\begin{tabular}{lrr}
\hline
\multicolumn{1}{c}{$\alpha$} & \multicolumn{1}{c}{$\beta$} & \multicolumn{1}{c}{$\delta$} \\\\
}", fixed = TRUE))
print(xtable(table, format = "latex"),
include.rownames = FALSE, include.colnames = FALSE, only.contents = TRUE)
cat(gsub("\\\\", "\\", r"{
\end{tabular}
\end{center}
\caption{A silly caption with \LaTeX commands}
\end{table}
}", fixed = TRUE))Note that if one abhors escaping backslash-characters, then one can read the header and footer from text files.
CohortMethodUnfortunately, we currently do not have a shared repository of functions to generate typical results figures and tables.
Even OhdsiShinyModules hides these tools inside inaccessible shiny-functions.
Until we solve this limitation, we recommend duplicating code from previous public manuscripts.
The LegendT2dmEvidenceExplorer from LEGEND-T2DM is a good example: getCovariateBalance()
connection <- DatabaseConnector::connect(connectionDetails)
sql <- paste0("SET search_path TO ", schema, ";")
DatabaseConnector::executeSql(connection = connection, sql = sql)
sql <- "
SELECT DISTINCT target_id, comparator_id,
cm_analysis.analysis_id, cm_analysis.description
FROM cm_covariate_balance
INNER JOIN cm_analysis
ON cm_analysis.analysis_id = cm_covariate_balance.analysis_id"
analysis <- DatabaseConnector::querySql(connection = connection,
sql = sql,
snakeCaseToCamelCase = TRUE)
DatabaseConnector::disconnect(connection)
datatable(analysis %>% arrange(analysisId, targetId, comparatorId))Shamelessly (until these functions are exported) stolen from OhdsiStudyModules/R:
# Stolen from `OhdsiShinyModules`
getCovariateBalance <- function(connectionHandler,
resultDatabaseSettings,
targetId,
comparatorId,
analysisId,
databaseId = NULL) {
sql <- "
SELECT
cmscb.database_id,
cmc.covariate_name,
--cmc.analysis_id analysis_id,
cmscb.target_mean_before before_matching_mean_treated,
cmscb.comparator_mean_before before_matching_mean_comparator,
abs(cmscb.std_diff_before) abs_before_matching_std_diff, --absBeforeMatchingStdDiff
cmscb.target_mean_after after_matching_mean_treated,
cmscb.comparator_mean_after after_matching_mean_comparator,
abs(cmscb.std_diff_after) abs_after_matching_std_diff
FROM
@results_schema.@cm_table_prefixshared_covariate_balance cmscb
JOIN @results_schema.@cm_table_prefixcovariate cmc ON cmscb.covariate_id = cmc.covariate_id AND cmscb.analysis_id = cmc.analysis_id AND cmscb.database_id = cmc.database_id -- database_id optional
-- JOIN @results_schema.@cm_table_prefixcovariate_analysis cmca ON cmca.analysis_id = cmc.analysis_id -- question: shouldn't we have a covariate_analysis_id in @table_prefixcovariate table?
WHERE
cmscb.target_id = @target_id
AND cmscb.comparator_id = @comparator_id
AND cmscb.analysis_id = @analysis_id
--AND cmc.covariate_analysis_id = @covariate_analysis_id
AND cmscb.database_id = '@database_id'
"
sql <- SqlRender::render(sql,
target_id = targetId,
comparator_id = comparatorId,
database_id = databaseId,
results_schema = schema,
cm_table_prefix = "cm_",
analysis_id = analysisId)
result <- DatabaseConnector::querySql(connection, sql,
snakeCaseToCamelCase = TRUE)
return(result)
}connection <- DatabaseConnector::connect(connectionDetails)
balance <- getCovariateBalance(connection, schema,
targetId = 19137001, # GLP1RA
comparatorId = 19021001, # DPP4I
analysisId = 1,
databaseId = 1902274292) # CCAE
DatabaseConnector::disconnect(connection)
datatable(balance %>% select(-databaseId),
colnames = c("Name", "bT", "bC", "bSD", "aT", "aC", "aSD"),
rownames = NULL)